Machine learning of the Bayesian belief network as a tool for evaluating the process frequency on social network data
Abstract
The paper considers the problem of estimating the frequency of processes whose mathematical model is a stochastic process consisting of a series of sequential episodes, with a known class of distributions for the length of the time interval between them. In the previously proposed approach, the input data included the value of the interval between the last episode and the end of the study period, which could lead to inaccurate results. This interval differs from the intervals between successive episodes, so its representation and processing require approaches that take this difference into account. The accuracy of the process frequency estimates was improved by developing a new model based on a Bayesian belief network that includes nodes corresponding to the intervals between the last episodes of the process and to the minimum and maximum intervals between episodes, and by correctly accounting for the interval between the last episode and the end of the study period at the model training stage. The authors propose a Bayesian belief network that includes a random element characterizing the interval between the last episode of the process during the study period and the end of that period; data on this interval can be available at the training stage. The network was modeled in R with the bnlearn package. The proposed approach to estimating process frequency relies on a Bayesian belief network built by machine learning methods. It increases the accuracy of the results by correctly handling the interval between the last episode and the end of the study period through a special scheme in the learned Bayesian belief network, which includes a “hypothetical” episode after the end of the study period.
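The interval quantities named above can be illustrated with a minimal sketch. It is written in Python for illustration only (the authors worked in R with bnlearn), and it does not build or train the network itself. The function name `interval_features`, the dictionary keys, and the choice of days as the unit are assumptions, not taken from the paper.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400.0


def interval_features(episodes, period_end, next_episode=None):
    """Summarise one process as interval lengths in days (hypothetical helper).

    episodes     -- datetimes of episodes inside the study period
    period_end   -- datetime marking the end of the study period
    next_episode -- first episode after the period, if known (training only)
    """
    times = sorted(episodes)
    if len(times) < 2:
        raise ValueError("at least two episodes are needed to form an interval")

    # Intervals between successive episodes.
    gaps = [(b - a).total_seconds() / SECONDS_PER_DAY
            for a, b in zip(times, times[1:])]

    features = {
        "last_gap": gaps[-1],   # interval between the two most recent episodes
        "min_gap": min(gaps),   # minimum interval between episodes
        "max_gap": max(gaps),   # maximum interval between episodes
        # Interval from the last episode to the end of the study period.
        # It is cut off by the period boundary, so it is kept separate
        # from the intervals between successive episodes.
        "tail_to_period_end": (period_end - times[-1]).total_seconds()
        / SECONDS_PER_DAY,
    }

    # When the first episode after the period is known, the full interval
    # that spans the period boundary can be computed as well.
    if next_episode is not None:
        features["tail_to_next_episode"] = (
            (next_episode - times[-1]).total_seconds() / SECONDS_PER_DAY
        )
    return features
```

For example, episodes on 1, 11 and 16 January with a period ending on 31 January give intervals of 10 and 5 days between episodes and a 15-day interval to the period end; if the next episode occurs on 5 February, the interval spanning the boundary is 20 days.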
To test the proposed approach, data were collected on 5608 Instagram users, comprising the times of their posts in 2020 and the time of their first post in 2021. Of the sample, 70 % was used to train the model, and 30 % was used to compare the posting frequency values predicted by the model with the known values. The results can be used in fields of science where a process frequency must be estimated under information deficit, when the whole process is observed for only a limited time. Obtaining such estimates is often important in medicine, epidemiology, sociology, etc. The approach shows good results when the theoretical model is compared with the results of learning from the social network data, which makes it possible to automate obtaining process frequency estimates.
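The 70 % / 30 % evaluation scheme can be sketched as follows, again in Python for illustration. The helper names `split_sample` and `mean_absolute_error`, the fixed random seed, and the use of mean absolute error as the comparison measure are all assumptions; the abstract does not state how the split was drawn or which measure the authors used.

```python
import random


def split_sample(records, train_share=0.7, seed=0):
    """Shuffle records reproducibly and split them into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(predicted, known):
    """Average absolute gap between predicted and known frequency values.

    This measure is an assumption made for the sketch, not the paper's metric.
    """
    if len(predicted) != len(known) or not known:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - k) for p, k in zip(predicted, known)) / len(known)


# Hypothetical user identifiers standing in for the 5608 collected accounts.
train_users, test_users = split_sample(range(5608))
```

With 5608 records and a 0.7 share, this split yields 3926 training records and 1682 test records.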